X3-Miner: mining patterns from XML database
Abstract
An XML-enabled framework for representing association rules in databases was first presented in [4]. In Frequent Structure Mining (FSM), one popular approach is graph matching, which uses data structures such as the adjacency matrix [7] or adjacency list [8]. Another approach represents semistructured tree-like structures as strings, which is more space efficient and relatively easy to manipulate [10]. With XML, however, mining association rules faces further challenges owing to the inherent flexibility of both structure and semantics: (1) a more complicated hierarchical data structure; (2) an ordered data context; and (3) a much larger data size. To tackle these challenges, we propose X3-Miner, an approach that efficiently extracts patterns from a large XML data set by: (1) using a model-validating approach to reduce the number of candidates generated, taking into account the semantics embedded in the tree-like structure of an XML database so that only valid candidates are obtained; (2) minimising I/O overhead by intersecting the XML database with the frequent 1-itemset, which yields a frequent 1-itemset XML tree, and progressively trimming candidate k-itemsets that contain infrequent (k-1)-itemsets; and (3) extending the string representation of a tree structure proposed in [10] to an xstring, which describes an XML document without loss of either structure or semantics. This extension makes the tree-structured XML data easier to traverse during model-validating candidate generation. Our experiments with both synthetic and real-life data sets demonstrate the effectiveness of the proposed model-validating approach in mining XML data.
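Step (2) of the abstract can be illustrated with a small Python sketch. It is not the X3-Miner implementation: it assumes that each child of the document root is one transaction and that the tags of its descendants are the items, and it ignores element order and hierarchy, which the xstring representation is meant to preserve. The function names, the sample document, and the support threshold are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical sample data: four records under one root.
XML = """
<db>
  <order><book/><pen/><ink/></order>
  <order><book/><pen/></order>
  <order><book/><ink/></order>
  <order><pen/><map/></order>
</db>
"""

def transactions_from_xml(xml_text):
    """Assumption: each child of the root is a transaction of descendant tags."""
    root = ET.fromstring(xml_text)
    return [frozenset(el.tag for el in record.iter() if el is not record)
            for record in root]

def frequent_items(transactions, min_support):
    """Items that occur in at least min_support transactions."""
    counts = Counter(item for t in transactions for item in t)
    return frozenset(item for item, n in counts.items() if n >= min_support)

def trim(transactions, frequent):
    """Intersect every transaction with the frequent 1-itemset."""
    return [t & frequent for t in transactions]

def frequent_itemsets(transactions, candidates, min_support):
    """Candidates contained in at least min_support transactions."""
    return {c for c in candidates
            if sum(c <= t for t in transactions) >= min_support}

def prune(candidates, frequent_prev):
    """Keep only k-candidates whose (k-1)-subsets are all frequent."""
    return {c for c in candidates
            if all(frozenset(s) in frequent_prev
                   for s in combinations(c, len(c) - 1))}

min_support = 2
raw = transactions_from_xml(XML)
f1 = frequent_items(raw, min_support)
transactions = trim(raw, f1)  # "map" is dropped from the last record

pairs = {frozenset(p) for p in combinations(sorted(f1), 2)}
f2 = frequent_itemsets(transactions, pairs, min_support)

triples = {frozenset(t) for t in combinations(sorted(f1), 3)}
kept = prune(triples, f2)  # triples containing an infrequent pair are discarded
```

In this toy data the pair {pen, ink} appears in only one record, so the single candidate triple is pruned without another pass over the data. Trimming first means later passes read smaller transactions, which is the I/O saving the abstract refers to.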
Similar resources
A Framework for Efficient Association Rule Mining in XML Data
In this paper, we propose a framework, called XAR-Miner, for mining ARs from XML documents efficiently. In XAR-Miner, raw data in the XML document are first preprocessed and transformed into either an Indexed XML Tree (IX-tree) or Multi-relational Databases (Multi-DB), depending on the size of the XML document and the memory constraint of the system, for efficient data selection and AR mining. Concepts that...
On Efficient and Effective Association Rule Mining from XML Data
In this paper, we propose a framework, called XAR-Miner, for mining ARs from XML documents efficiently and effectively. In XAR-Miner, raw XML data are first transformed into either an Indexed Content Tree (IX-tree) or Multi-relational databases (Multi-DB), depending on the size of the XML document and the memory constraint of the system, for efficient data selection during AR mining. Concepts that are re...
Mining security events in a distributed agent society
In a distributed agent architecture, tasks are performed on multiple computers, which are sometimes spread across different locations. While it is important to collect security-critical sensory information from the agent society, it is equally important to analyze and report such security events in a precise and useful manner. Data mining techniques are found to be very efficient in the generation...
PXML-Miner: A Projection-Based Interesting XML Rule Mining Technique
In recent times, the mining of association rules from XML databases has received attention because of its wide applicability and flexibility. Many mining methods have been proposed. Because of the inherent flexibility of the structures and the semantics of the documents, however, these methods are challenging to use. In order to accomplish the mining, an XML document must first be converted int...
Progressive CFM-Miner: An Algorithm to Mine CFM - Sequential Patterns from a Progressive Database
Sequential pattern mining is a vital data mining task for discovering frequently occurring patterns in sequence databases. As databases grow, maintaining sequential patterns over a very long period of time becomes essential, since a large number of new records may be added to a database. To reflect the current state of the database where previous sequential patterns ...
Publication date: 2005